Given data points 𝐚1,…,𝐚n∈ℝd\mathbf{a}_1,…,\mathbf{a}_n \in \mathbb{R}^d, find centers 𝛍1,...,𝛍k∈ℝd\mathbf{\mu}_1,...,\mathbf{\mu}_k \in \mathbb{R}^d to minimize: Cost(𝛍1,...,𝛍k)=∑i=1nminj=1,...,k||𝛍j−𝐚i||22Cost(\mathbf{\mu}_1,...,\mathbf{\mu}_k)=\sum_{i=1}^n \min_{j=1,...,k} ||\mathbf{\mu}_j-\mathbf{a}_i||_2^2
Equivalent form: find clusters C1,...,Ck⊆{1,...,n}C_1,...,C_k \subseteq \{1,...,n\} to minimize: Cost(C1,...,Ck)=∑i=1n12|Cj|∑u,v∈Cj||𝐚u−𝐚v||22Cost(C_1,...,C_k)=\sum_{i=1}^n \frac{1}{2|C_j|}\sum_{u,v \in C_j}||\mathbf{a}_u-\mathbf{a}_v||_2^2 Approximation scheme:
#incomplete